Also for k-means: more data does not imply better performance

Abstract: Arguably, a desirable feature of a learner is that its performance gets better with an increasing amount of training data, at least in expectation. This issue has received renewed attention in recent years, and some curious and surprising findings have been reported. In essence, these results show that more data does not necessarily lead to improved performance; worse even, performance can deteriorate. Clustering, however, has not been subjected to this kind of study up to now. This paper shows that k-means clustering, a ubiquitous technique in machine learning and data mining, suffers from the same lack of so-called monotonicity and can display deterioration in expected performance with increasing training set sizes. Our main, theoretical contributions prove that 1-means clustering is monotonic, while 2-means is not even weakly monotonic, i.e., the occurrence of nonmonotonic behavior persists indefinitely, beyond any training sample size. For larger k, the question remains open.
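To make the notion of a learning curve in the abstract concrete, the following is a minimal Python sketch, not the authors' code, that estimates the expected k-means risk (mean squared distance to the nearest centroid on fresh data) as a function of training set size. All function names (`fit_one_mean`, `fit_two_means_1d`, `risk`, `learning_curve`) and the standard normal sampler are illustrative assumptions. For 1-means the fitted centroid is the sample mean, so the expected risk equals σ²(1 + 1/n) for any finite-variance distribution, which decreases in n and is consistent with the monotonicity result stated above. The distribution the paper uses to show nonmonotonicity for 2-means is not given in this listing, so the 2-means routine is provided only as a tool for experimenting with a sampler of your own choosing.

```python
import numpy as np


def fit_one_mean(sample):
    """1-means: the single centroid minimizing squared error is the sample mean."""
    return np.array([sample.mean()])


def fit_two_means_1d(sample):
    """Exact 2-means for 1-D data: try every split of the sorted sample."""
    x = np.sort(sample)
    n = len(x)
    if n < 2:
        # Only one point: both centroids coincide with it.
        return np.array([x[0], x[0]])
    prefix = np.cumsum(x)
    prefix_sq = np.cumsum(x ** 2)
    best_cost, best = np.inf, None
    for i in range(1, n):  # left cluster is x[:i], right cluster is x[i:]
        left_sum, right_sum = prefix[i - 1], prefix[-1] - prefix[i - 1]
        left_sq, right_sq = prefix_sq[i - 1], prefix_sq[-1] - prefix_sq[i - 1]
        # Within-cluster sum of squares: sum(x^2) - (sum x)^2 / count.
        cost = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if cost < best_cost:
            best_cost = cost
            best = np.array([left_sum / i, right_sum / (n - i)])
    return best


def risk(centroids, test_points):
    """Mean squared distance from each test point to its nearest centroid."""
    d2 = (test_points[:, None] - centroids[None, :]) ** 2
    return d2.min(axis=1).mean()


def learning_curve(fit, sampler, sizes, n_trials, n_test, rng):
    """Monte Carlo estimate of the expected risk for each training set size."""
    test = sampler(rng, n_test)  # large held-out sample approximates the true risk
    return [
        np.mean([risk(fit(sampler(rng, n)), test) for _ in range(n_trials)])
        for n in sizes
    ]


def normal_sampler(rng, n):
    """Hypothetical data source: standard normal, variance 1."""
    return rng.standard_normal(n)


rng = np.random.default_rng(0)
sizes = [1, 2, 5, 10]
curve = learning_curve(fit_one_mean, normal_sampler, sizes, 500, 2000, rng)
for n, r in zip(sizes, curve):
    # Closed form for 1-means with unit variance: 1 + 1/n.
    print(n, round(float(r), 3), "theory:", 1 + 1 / n)
```

Swapping `fit_one_mean` for `fit_two_means_1d` and `normal_sampler` for another 1-D sampler gives an empirical 2-means curve; whether it dips or rises at a given n depends on the chosen distribution and is subject to Monte Carlo noise.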

Similar resources

Does more data always yield better translations?

Nowadays, there are large amounts of data available to train statistical machine translation systems. However, it is not clear whether all the training data actually help or not. A system trained on a subset of such huge bilingual corpora might outperform the use of all the bilingual data. This paper studies such issues by analysing two training data selection techniques: one based on approxima...

Superintelligence Does Not Imply Benevolence

As machines become capable of more autonomous and intelligent behavior, will they also display more morally desirable behavior? Earth’s history tends to suggest that increasing intelligence, knowledge, and rationality will result in more cooperative and benevolent behavior. Animals with sophisticated nervous systems track and punish exploitative behavior, while rewarding cooperation. Humans form ...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

Count(q) Does Not Imply Count(p)

I solve a conjecture originally studied by M. Ajtai. It states that for different primes q, p the matching principles Count(q) and Count(p) are logically independent. I prove that this indeed is the case. Actually I show that Count(q) implies Count(p) exactly when each prime factor in p also is a factor in q. 1 The logic of elementary counting “She loves me, she loves me not, she loves me,. . ....

Does Level-k Behavior Imply Level-k Thinking?

I design an experiment to interpret the observed Lk behavior. It distinguishes between the “Lkb” players, who have high ability and best respond to Lk belief, and the “Lka” players, who could use, at most, k steps of reasoning, and thus could not respond to L(k+1) or higher-order belief. The separation utilizes a combination of simultaneous and sequential ring games. In the sequential games it r...

Journal

Journal title: Machine Learning

Year: 2023

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-023-06361-6